An Analysis of Y-Plantroom

By Joe Brakoniecki

Email: jdb209@cam.ac.uk


This notebook concerns the analysis of data collected from the Y Plantroom from the Belimo Sensors installed in the period of November 2024 - Febuary 2025. From plotting some basic time series and characteristic graphs of the data, I noticed that the Constant Temperature (CT) systems are negligble in comparison to the Variable Temperature (VT) systems, so the remainder of the notebook focuses on modelling the load in the Cripps and Fisher VT systems using the availble data from the Building Managment System (BMS) - Priva.

Constant and Variable Temperature Systems


A constant temperature system is one in which the temperature of the water circulating the system does not change with outside factors, e.g. outside temperature. These systems are used for providing domestic hot water (DHW) as you want the same temperature hot water in all conditions.

A variable temeprature system is one in which the temperature of the water cicualting the system does change with outside factors, e.g. outside temperature. These systems are used for space heating, as different amounts of heat will be required to heat a room to a constant temperature depending on external factors.

Data Sources


We have data from Belimo sensors in the following locations:

  • Fisher VT (1.1)
  • Fisher CT (1.3)
  • Cripps VT (1.2)
  • Cripps CT (1.4)
  • MCC1 (1.5) - Not in the scope of this analysis

Each Belimo Sensor Measures: Flow Temperature, Return Temperature, Flow Rate, Flow Velocity in 30 second intervals and uses these metrics to calculate the Heating Load on each system $ ( P = m c T )$. Each group of sensors also has one outside temperature sensor.

The data from these Belimo Sensors was then supplmented by the following BMS readings from Priva:

  • Fisher VT Flow Temperature
  • Cripps VT Flow Temperature
  • Fisher Boiler Room Outside Temperature

The BMS measures the above data points in 8 minute intervals.

Code
# Install requirements
import pandas as pd
import plotly.offline as py
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import linregress
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import root_mean_squared_error, r2_score, mean_absolute_percentage_error
from sklearn.preprocessing import StandardScaler
import plotly.io as pio
pio.renderers.default = "notebook_connected"
Code
# Fetch the Cleaned Belimo Data (run Clean_Belimo_Data.py to produce the file)

# Create Key for 1_1, 1_2, ect codes
legend_lookup = {
    "1_1" : "Fisher VT",
    "1_2" : "Cripps VT",
    "1_3" : "Cripps New CT",
    "1_4" : "Cripps Old CT",
}

data = pd.read_parquet("BelimoData/Combined.parquet")

data_12hr = data.resample("12h").mean()

Analysis of Load By Heating System


Code
fig1 = go.Figure()
for i in data_12hr.columns:
    if i != "1_5" and i!= "Total P" and i.split(" ")[-1] != "Outside" and i.split(" ")[-1] != "Flow":
        fig1.add_trace(go.Scatter(
            name = legend_lookup[i],
            x = data_12hr.index,
            y = data_12hr[i],
            stackgroup='one'
           ))

fig1.add_trace(go.Scatter(
            name = "Total Load",
            x = data_12hr.index,
            y = data_12hr["Total P"],
           ))

fig1.update_layout(
    title=dict(
        text="Y-Plantroom Load by Heating System"
    ),
    xaxis=dict(
        title=dict(
            text="Timestamp"
        )
    ),
    yaxis=dict(
        title=dict(
            text="Load - 12 hour Average (W)"
        )
    ),
    legend=dict(
        title=dict(
            text="Schematic Position"
        )
    )
)
fig1.show()

The above graph shows how the total load of the Y-plantroom, averaged over 12 hour intervals, varies with time and how it breaks down into individual heating systems.

Key observation from this graph include:

  • Cripps New CT’s load is on the order of about ~80 kW peaking at ~ 186 kW, this is signficiantly less any other of the systems, being ~2x less than the VT systems at its nominal value
  • Cripps Old CT’s load is on the order of about ~30 kW peaking at ~47 kW, about ~10x less than either VT system
  • Fisher VT’s load is on the order of about ~150 kW peaking at ~242 kW
  • Cripps VT’s load is on the order of about ~130 kW peaking at ~147 kW

THus we can see that the CT systems hold a smaller share of the load (~ 120 kW) compared to the (~ 389 kW) for the VT systems, thus if we are looking to make the most significant optimisation opportuntites we should start with them.

It is also important to note that the plantroom has a 2 MW capacity but this chart shows our total load as never surpassing 500 kW so the boilers are ~4x oversized.

This is the reason the majority of this notebook focuses on them.

Load-Temperature Characteristic By Heating System


Code
data_6hr = data.resample("6h").mean()

fig2 = go.Figure()

for i in data.columns:
    if i != "1_5" and i.split(" ")[-1] != "Outside" and i.split(" ")[-1] != "Flow" and i != "Total P":
        fig2.add_trace(go.Scatter(
            name = legend_lookup[i],
            x = data_6hr[i + " T Outside"],
            y = data_6hr[i],
            mode = "markers"
           ))

fig2.update_layout(
    title=dict(
        text="Temperature-Load Characterists by System"
    ),
    xaxis=dict(
        title=dict(
            text="Outside Temperature (°C)"
        )
    ),
    yaxis=dict(
        title=dict(
            text="Load - 6 hour Average (W)"
        )
    ),
    legend=dict(
        title=dict(
            text="Schematic Position"
        )
    )
)
fig2.show()

The above plot shows how the 6 hour average load varies with the outside temperature.

We can clearly see the following from this plot: - Both VT systems show a negative correlation between load and outside temperature - Both CT systems show no correlation between load and outside temperature

This is what we would expect as increasing outside temperature will result in the ambient space temperature increasing and thus the heating system will have to do less work to maintain a consistant temperature, thus resulting in a lower load.

Since there are negligble losses, the CT system will have no dependance on outside temperature, as the amount of heat required, and thus the load, is a function of demand only.

Heating Curve


Code
data_hourly = data.resample("1h").mean()


fig12 = go.Figure()

fig12.add_trace(go.Scatter(
            name =  "Cripps VT",
            x = data_hourly["1_2" + " T Outside"],
            y = data_hourly["1_2" + " T Flow"],
            mode = "markers"
           ))

fig12.add_trace(go.Scatter(
            name = "Fisher VT",
            x = data_hourly["1_1" + " T Outside"],
            y = data_hourly["1_1" + " T Flow"],
            mode = "markers"
           ))

fig12.update_layout(
    title=dict(
        text="Heating Curves for VT Systems (Hourly Average)"
    ),
    xaxis=dict(
        title=dict(
            text="Outside Temperature (°C)"
        )
    ),
    yaxis=dict(
        title=dict(
            text="Flow Temperature (°C)"
        )
    ),
    legend=dict(
        title=dict(
            text="Series"
        )
    )
)
fig12.show()

A Heating Curve is a graph which describes the VT supply temperature as a function of the ambient temperature.

image.png

For each system you should expect to have 2 heating curves that look similar to the picture above. One heating curve would be for during the day and the other for during the night. The heating curve should plataue at high temperature as the temperature at which the boiler can heat water to has an upper limit.

These 2 heating curves are visisble in the Fisher VT data, as shown by the parralell lines. However in the cripps there is much more noise, with one line faintly visible through the middle. This line corresponds to the moments at which the load is controlled by the weather conditions. There are 2 fixed lines at 30°C and 65°C, these must correspond to when the BMS has taken control and altered the load manually. In these regions the system is operating as a CT system rarther than a VT system, which is in theory less energy efficient and so is worth some investigation.

Comparing Equivilent BMS and Belimo Readings


Code
priva_data = pd.read_parquet("Y Plantroom/data.parquet")
priva_data.index = pd.to_datetime(priva_data.index, utc=True)

data_8min = data.loc[data.index > pd.to_datetime(
        "2024-11-18T00:00:00", format="%Y-%m-%dT%H:%M:%S", utc=True
    )]

data_8min = data_8min.resample("8min").mean()

data_8min = data_8min.merge(priva_data, on="timestamp", how = "outer")

data_8min.dropna(inplace = True)
data_sample = data_8min.resample("12h").mean()
Code
fig3 = make_subplots(specs=[[{"secondary_y": True}]])

fig3.add_trace(
    go.Scatter(x=data_sample.index, y=data_sample["Fisher VT Flow-temperature"], name="Fisher BMS"),
    secondary_y=False
)

fig3.add_trace(
    go.Scatter(x=data_sample.index, y=data_sample["1_1 T Flow"], name="Fisher Belimo"),
    secondary_y=False
)

fig3.add_trace(
    go.Scatter(x=data_sample.index, y=data_sample["Cripps VT Flow-temperature"], name="Cripps BMS"),
    secondary_y=False
)

fig3.add_trace(
    go.Scatter(x=data_sample.index, y=data_sample["1_2 T Flow"], name="Cripps Belimo"),
    secondary_y=False
)

fig3.update_layout(
    title=dict(
        text="Belimo and BMS Flow Temperature Sensors plotted against time"
    ),
    xaxis=dict(
        title=dict(
            text="Timestamp"
        )
    ),
    yaxis=dict(
        title=dict(
            text="Flow Temperature (°C)"
        )
    ),
    legend=dict(
        title=dict(
            text="Series"
        )
    )
)

fig3.show()

The above time series plot compares the Belimo Flow temperature results with the BMS results. We can see that in oth cases the curves follow each other nicely, having the same peaks and troughs at the same times. The belimo data does deviate slightly from the BMS data, holding a lower value at all times.

I would guess this is due to differences in sensor placement but requires further investigation.

Code
cripps = linregress(x = data_8min["Cripps VT Flow-temperature"],y = data_8min["1_2 T Flow"])
fisher = linregress(x = data_8min["Fisher VT Flow-temperature"],y = data_8min["1_1 T Flow"])

print(f"""
Cripps VT regression:
Gradient = {cripps.slope}
Intercept = {cripps.intercept}
R^2 Value = {cripps.rvalue **2}
---""")

print(f"""Fisher VT regression:
Gradient = {fisher.slope}
Intercept = {fisher.intercept}
R^2 Value = {fisher.rvalue **2}
""")

Cripps VT regression:
Gradient = 0.8936953057485365
Intercept = 0.13989488679636963
R^2 Value = 0.8866651292889104
---
Fisher VT regression:
Gradient = 0.8248076814080582
Intercept = 8.832884583333602
R^2 Value = 0.8450331907517058
Code
x = np.arange(30,73, 2)

fig4 = go.Figure()

fig4.add_trace(go.Scatter(
            name = "Fisher VT",
            x = data_sample["Fisher VT Flow-temperature"],
            y = data_sample["1_1 T Flow"],
            mode = "markers",
            line=dict(color="#00CC96")
           ))

fig4.add_trace(go.Scatter(
            name = "Fisher VT",
            x = x,
            y = fisher.slope * x + fisher.intercept,
            mode = "lines",
            line=dict(color="#00CC96")
           ))

fig4.add_trace(go.Scatter(
            name = "Cripps VT",
            x = data_sample["Cripps VT Flow-temperature"],
            y = data_sample["1_2 T Flow"],
            mode = "markers",
            line=dict(color="#AB63FA")
           ))

fig4.add_trace(go.Scatter(
            name = "Cripps VT",
            x = x,
            y = cripps.slope * x + cripps.intercept,
            mode = "lines",
            line=dict(color="#AB63FA")
           ))


fig4.update_layout(
    title=dict(
        text="Y Plantroom Belimo vs BMS Flow Temperature"
    ),
    xaxis=dict(
        title=dict(
            text="BMS Flow Temperature (°C)"
        )
    ),
    yaxis=dict(
        title=dict(
            text="Belimo Reading (°C)"
        )
    ),
    legend=dict(
        title=dict(
            text="Series"
        )
    )
)
fig4.show()

The above graph and regression summaries show a clear linear realtionship between the belimo reading and flow temperature. There are some anomolous values which distort the regression resulting in the lines deviting from the y=x curve we would expect.

I remove these anomlies to produce the following regressions:

Code
# Remove anamolous Values and repeat regression
data_8min1 = data_8min.loc[data_8min["1_1 T Flow"] != 65.5]
data_8min1 = data_8min1.loc[data_8min1["1_2 T Flow"] != 29.67]
data_8min1 = data_8min1.loc[data_8min1["1_1 T Flow"] != 66.28]
data_8min1 = data_8min1.loc[data_8min1["1_2 T Flow"] != 29.67]

data_sample1 = data_8min.resample("12h").mean()

cripps = linregress(x = data_8min1["Cripps VT Flow-temperature"],y = data_8min1["1_2 T Flow"])
fisher = linregress(x = data_8min1["Fisher VT Flow-temperature"],y = data_8min1["1_1 T Flow"])

print("Improved Regressions:")

print(f"""---
Cripps VT regression:
Gradient = {cripps.slope}
Intercept = {cripps.intercept}
R^2 Value = {cripps.rvalue **2}""")

print(f"""---
Fisher VT regression:
Gradient = {fisher.slope}
Intercept = {fisher.intercept}
R^2 Value = {fisher.rvalue **2}""")
Improved Regressions:
---
Cripps VT regression:
Gradient = 0.917000487880897
Intercept = -0.6607158776092064
R^2 Value = 0.9854920505667721
---
Fisher VT regression:
Gradient = 0.923735768690665
Intercept = 2.2029214958053274
R^2 Value = 0.9931762017477542
Code
x = np.arange(30,73, 2)

fig5 = go.Figure()

fig5.add_trace(go.Scatter(
            name = "Fisher VT",
            x = data_sample1["Fisher VT Flow-temperature"],
            y = data_sample1["1_1 T Flow"],
            mode = "markers",
            line=dict(color="#00CC96")
           ))

fig5.add_trace(go.Scatter(
            name = "Fisher VT",
            x = x,
            y = fisher.slope * x + fisher.intercept,
            mode = "lines",
            line=dict(color="#00CC96")
           ))

fig5.add_trace(go.Scatter(
            name = "Cripps VT",
            x = data_sample1["Cripps VT Flow-temperature"],
            y = data_sample1["1_2 T Flow"],
            mode = "markers",
            line=dict(color="#AB63FA")
           ))

fig5.add_trace(go.Scatter(
            name = "Cripps VT",
            x = x,
            y = cripps.slope * x + cripps.intercept,
            mode = "lines",
            line=dict(color="#AB63FA")
           ))


fig5.update_layout(
    title=dict(
        text="Y Plantroom Belimo vs BMS sensors"
    ),
    xaxis=dict(
        title=dict(
            text="BMS Flow Temperature (°C)"
        )
    ),
    yaxis=dict(
        title=dict(
            text="Belimo Reading (°C)"
        )
    ),
    legend=dict(
        title=dict(
            text="Series"
        )
    )
)
fig5.show()

We can see that removing these anomolous values results in a much tighter fit to to the data, with greater \(R^2\) values and gradients closer to the 1 we expect. Both still have larger deviations from 1 and 0 in the gradients and intercepts, which I belive is a followthrough of the error we obersved in the time series.

This deviation requires further investigation as to why it is present.

The strong correlation between the belimo and BMS flow readings indicate that there is consistancy between the 2 data sources and thus usins BMS data to model the load as measured by Belimo is appropriate.

Creating a Linear Model for Load


This section follows the production of a linear model for the steady state load in terms of BMS Flow Temperature and Outside Temperature readings. I am using the load calculated from Belimo sensors to train the model, in concordance with this BMS data.

The model I have applied is a basic linear regression. A linear regression using an Ordinary Least Squares (OLS) method.

An Aside on Linear Regressions


A linear model takes the form of:

\[ y = m_1x_1 + m_2x_2 + ... + m_ix_i + c + \varepsilon \]

Where \(x_i\) is an input variable, \(m_i\) is its respective coefficient, \(c\) is a static offset/intercept, \(\varepsilon\) is the error, and \(y\) is the output.

For clarity we can represent this in matrix form:

\[ y = \begin{bmatrix} m_1 & 0 & ... & 0\\ 0 & m_2 & ...& 0\\ \vdots & \vdots & \ddots &0\\ 0 & 0 & ... &m_i\\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\\ x_i \end{bmatrix} + c + \varepsilon = M \textbf{x} + c + \varepsilon \]

An OLS regression works by taking an input of a dataset of known values of \(\textbf{x}\) and their respective \(y\). It then uses a variety of mathmatical techniques (explored in this wikipedia article) to find the values of \(c\) and \(M\) which minimise the square of the residuals (difference between observed and predicted values). We square the residual to ensure that the positive and negative errors do not cancel one another out, additionaly squaring errors gives more weight to larger errors, resulting in the model reducing bigger mistakes.

In the diagram below the white lines are the residuals, the red points are the training dataset and the black line is the model. The best model is when the length of all the white lines is minimised.

image.png

More information into OLS models and applying them in python can be found here

Quantifing model performance

To quantify how good our model is I will use the following metrics: - \(R^2\) or the coefficient of determination. It is the proportion of variation in the dependent variable that is predictable from the independent variables - i.e. how well the model responds to nuance in the data. 1 indicates a perfect fit and 0 indicates that no variablity in the dependant varaible is explained by the model. - RMSE or Root mean squared error is an indication of the average residual between the model and the real value. It gives us an indication of the average absolute uncertainty in the model. Since we are squaring all of the residuals this metric is sensitive to large outliers in the data. - MAPE or Mean Absoulte Percentage Error is another indication of the average percentage uncertainty in the model. Instead of squaring the residuals this function takes the absoulte value and so is less sensitive to outliers.

More information about these metrics can be found here A usefule guid on interpreting \(R^2\) can be found here

Code
fig6 = go.Figure()

fig6.add_trace(go.Scatter(
            name = "Fisher VT",
            x = data_sample["Fisher VT Flow-temperature"],
            y = data_sample["1_1"],
            mode = "markers",
            line=dict(color="#00CC96")
           ))

fig6.add_trace(go.Scatter(
            name = "Cripps VT",
            x = data_sample["Cripps VT Flow-temperature"],
            y = data_sample["1_2"],
            mode = "markers",
            line=dict(color="#AB63FA")
           ))

fig6.update_layout(
    title=dict(
        text="Belimo Load vs BMS Flow Tempreature Sensors"
    ),
    xaxis=dict(
        title=dict(
            text="BMS Flow Temperature (°C)"
        )
    ),
    yaxis=dict(
        title=dict(
            text="Load from Belimo (W)"
        )
    ),
    legend=dict(
        title=dict(
            text="Series"
        )
    )
)
fig6.show()
Code
fig7 = go.Figure()

fig7.add_trace(go.Scatter(
            name = "Fisher VT",
            x = data_sample["Outside-temperature"],
            y = data_sample["1_1"],
            mode = "markers",
            line=dict(color="#00CC96")
           ))

fig7.add_trace(go.Scatter(
            name = "Cripps VT",
            x = data_sample["Outside-temperature"],
            y = data_sample["1_2"],
            mode = "markers",
            line=dict(color="#AB63FA")
           ))

fig7.update_layout(
    title=dict(
        text="Belimo Load vs BMS Outdoor Sensor"
    ),
    xaxis=dict(
        title=dict(
            text="BMS Outside Temperature (°C)"
        )
    ),
    yaxis=dict(
        title=dict(
            text="Load from Belimo (W)"
        )
    ),
    legend=dict(
        title=dict(
            text="Series"
        )
    )
)
fig7.show()
Code
print(f"""Correlation Coefficinets for: Belimo Load vs BMS Outdoor Sensor:
Fisher : {data_8min["1_1"].corr(data_8min["Outside-temperature"])}
Cripps : {data_8min["1_2"].corr(data_8min["Outside-temperature"])}
---
Correlation Coefficinets for: Belimo Load vs BMS Outdoor Sensor:
Fisher : {data_8min["1_1"].corr(data_8min["Fisher VT Flow-temperature"])}
Cripps : {data_8min["1_2"].corr(data_8min["Cripps VT Flow-temperature"])}""")
Correlation Coefficinets for: Belimo Load vs BMS Outdoor Sensor:
Fisher : -0.3835503667306269
Cripps : -0.19819024335597296
---
Correlation Coefficinets for: Belimo Load vs BMS Outdoor Sensor:
Fisher : 0.6910877534508774
Cripps : 0.666072928828072

The above graphs and correlation coefficients show that there is a pretty strong positive correlation between the flow temperature and load, with correlation coeefcients of ~0.7. The correaltion between outside temperature and load much weaker but still present with smaller magnitude negative correlation coefficients.

I believe that there is enough of a correlation between these 2 variables and load to perform a linear regressionuse the former to predict the latter, thus these are the 2 variables I will choose to use.

Code
fig8 = go.Figure()

fig8.add_trace(go.Scatter3d(
            name = "Fisher VT",
            x = data_8min["Outside-temperature"],
            z = data_8min["1_1"],
            y = data_8min["Fisher VT Flow-temperature"],
            mode = "markers",
            line=dict(color="#00CC96"),
            marker_size=1
           ))

fig8.add_trace(go.Scatter3d(
            name = "Cripps VT",
            x = data_8min["Outside-temperature"],
            z = data_8min["1_2"],
            y = data_8min["Cripps VT Flow-temperature"],
            mode = "markers",
            line=dict(color="#AB63FA"),
            marker_size=1
           ))

fig8.update_layout(
    title=dict(
        text="Load vs Outdoor Temperature vs Flow Temperature"
    ),
    scene = dict(
    xaxis=dict(
        title=dict(
            text="Outside Temperature (°C)"
        )
    ),
    yaxis=dict(
        title=dict(
            text="Flow temperature (°C)"
        )
    ),
    zaxis=dict(
        title=dict(
            text="Load (W)"
        )
    )),
    legend=dict(
        title=dict(
            text="Series"
        )
    )
)
fig8.show()

From the above plot we can see a trend between the 3 variables, looking at the graph from the top down we can clearly see the heating curves used to define the systems. There is, however, a lot of variation in the data, this is because the boiler cycles on and off in daily or bidaily cycles. During the cycling on and off, we end up with peaks as the boiler start up, steady state regions of high ~constant load, low power regions and 0 power (off) regions. These are annotated on the graph below:

image.png

For the scope of this document we are only interested in modelling the steady-state power, and since the other regions occur at regularly repeating time intervals we can easily remove them from the dataset before we train the model to improve results.

Fisher Model

Code
#https://www.digitalocean.com/community/tutorials/multiple-linear-regression-python#preprocess-the-data
data_toUse = data_8min.loc[data_8min.index.time < pd.to_datetime(
        "10:00:00", format="%H:%M:%S", utc=True
    ).time()]

data_toUse = data_toUse.loc[data_toUse.index.time > pd.to_datetime(
        "06:30:00", format="%H:%M:%S", utc=True
    ).time()]

data_toUse2 = data_8min.loc[data_8min.index.time > pd.to_datetime(
        "17:00:00", format="%H:%M:%S", utc=True
    ).time()]

data_toUse2 = data_toUse2.loc[data_toUse2.index.time < pd.to_datetime(
        "22:00:00", format="%H:%M:%S", utc=True
    ).time()]

data_toUse = pd.concat([data_toUse, data_toUse2])

data_toUse = data_toUse.loc[data_toUse["1_1"] != 184e3]
data_toUse = data_toUse.loc[data_toUse["1_1"] != 141.229e3]

scaler = StandardScaler()

XF = data_toUse[["Fisher VT Flow-temperature", "Outside-temperature"]]

yF = data_toUse["1_1"]

XF_scaled = scaler.fit_transform(XF)

XF_train, XF_test, yF_train, yF_test = train_test_split(XF_scaled, yF, test_size=0.2, random_state=42)

regrF = LinearRegression()

regrF.fit(XF_train, yF_train)

yF_pred = regrF.predict(XF_test)

print("RMSE:", root_mean_squared_error(yF_test, yF_pred), "W")
print("R-squared:", r2_score(yF_test, yF_pred))
print("MAPE:", mean_absolute_percentage_error(yF_test, yF_pred))
RMSE: 18895.998508006895 W
R-squared: 0.6324670244573225
MAPE: 0.07985614765220582
Code
fig9 = go.Figure()

fig9.add_trace(go.Scatter3d(
            name = "Fisher VT Model",
            x = list(zip(*XF_test))[0],
            z = yF_pred,
            y = list(zip(*XF_test))[1],
            mode = "markers",
            line=dict(color="#AB63FA"),
            marker_size=1
           ))

fig9.add_trace(go.Scatter3d(
            name = "Fisher VT",
            x = list(zip(*XF_test))[0],
            z = yF_test,
            y = list(zip(*XF_test))[1],
            mode = "markers",
            line=dict(color="#00CC96"),
            marker_size=1
           ))


fig9.update_layout(
    title=dict(
        text="Load vs Normalized Outdoor Temperature vs Normalized Flow Temperature for the Testing Dataset"
    ),
    scene = dict(
    xaxis=dict(
        title=dict(
            text="Outside Temperature (norm.)"
        )
    ),
    yaxis=dict(
        title=dict(
            text="Flow temperature (norm.)"
        )
    ),
    zaxis=dict(
        title=dict(
            text="Load (W)"
        )
    )),
    legend=dict(
        title=dict(
            text="Series"
        )
    )
)
fig9.show()
Code
fig10 = go.Figure()

fig10.add_trace(go.Scatter(
            name = "Fisher VT",
            x = data_8min.index,
            y = data_8min["1_1"],
            line=dict(color="#00CC96")
           ))

fig10.add_trace(go.Scatter(
            name = "Training + Testing Dataset",
            x = data_toUse.index,
            y = data_toUse["1_1"],
            mode = 'markers'
           ))

fig10.add_trace(go.Scatter(
            name = "Predicted Values",
            x = data_8min.index,
            y = regrF.predict(scaler.fit_transform(data_8min[["Fisher VT Flow-temperature", "Outside-temperature"]])),
        line=dict(color="#AB63FA")
           ))

fig10.update_layout(
    title=dict(
        text="Fisher VT Load vs Time"
    ),
    xaxis=dict(
        title=dict(
            text="Timestamp"
        )
    ),
    yaxis=dict(
        title=dict(
            text="Load (W)"
        )
    ),
    legend=dict(
        title=dict(
            text="Series"
        )
    )
)
fig10.show()

We can see from the above graphs that the model predicts data in the region we would expect, being a pretty accurate line of best fit through the data. The time series shows that it is predicting the correct order of magnitude of values, hovering around the steady state value for the entire data set.

Looking at the performance statistics of the model, we can see that \(R^2 = 0.63\), a MAPE of \(\pm 8%\) and a RMSE of \(\pm 19\) kW. These indicate that approximatly 63% of the variation of the raw data can be explained by the model. Since the data is on the order of ~200 kW, the \(\pm 8%\) and \(\pm 19\) kW are consistant and indicate that the model is accurate to <10%.

Cripps Model

Code
#https://www.digitalocean.com/community/tutorials/multiple-linear-regression-python#preprocess-the-data
data_toUse = data_8min.loc[data_8min.index.time < pd.to_datetime(
        "10:00:00", format="%H:%M:%S", utc=True
    ).time()]

data_toUse = data_toUse.loc[data_toUse.index.time > pd.to_datetime(
        "07:30:00", format="%H:%M:%S", utc=True
    ).time()]

data_toUse2 = data_8min.loc[data_8min.index.time > pd.to_datetime(
        "17:00:00", format="%H:%M:%S", utc=True
    ).time()]

data_toUse2 = data_toUse2.loc[data_toUse2.index.time < pd.to_datetime(
        "22:00:00", format="%H:%M:%S", utc=True
    ).time()]

data_toUse = pd.concat([data_toUse, data_toUse2])

data_toUse = data_toUse.loc[data_toUse["1_2"] != 9e3]
data_toUse = data_toUse.loc[data_toUse["1_2"] != 0]
data_toUse = data_toUse.loc[data_toUse["1_1"] != 184e3]
data_toUse = data_toUse.loc[data_toUse["1_1"] != 141.229e3]

XC = data_toUse[["Cripps VT Flow-temperature", "Outside-temperature"]]

yC = data_toUse["1_2"]

XC_scaled = scaler.fit_transform(XC)

XC_train, XC_test, yC_train, yC_test = train_test_split(XC_scaled, yC, test_size=0.2, random_state=42)

regrC = LinearRegression()

regrC.fit(XC_train, yC_train)

yC_pred = regrC.predict(XC_test)

print("RMSE:", root_mean_squared_error(yC_test, yC_pred), "W")
print("R-squared:", r2_score(yC_test, yC_pred))
print("MAPE:", mean_absolute_percentage_error(yC_test, yC_pred))
RMSE: 11638.47340953614 W
R-squared: 0.7353293896184502
MAPE: 0.07669892433046585
Code
fig11 = go.Figure()

fig11.add_trace(go.Scatter3d(
            name = "Cripps VT Model",
            x = list(zip(*XC_test))[0],
            z = yC_pred,
            y = list(zip(*XC_test))[1],
            mode = "markers",
            line=dict(color="#AB63FA"),
            marker_size=1
           ))

fig11.add_trace(go.Scatter3d(
            name = "Cripps VT",
            x = list(zip(*XC_test))[0],
            z = yC_test,
            y = list(zip(*XC_test))[1],
            mode = "markers",
            line=dict(color="#00CC96"),
            marker_size=1
           ))


fig11.update_layout(
    title=dict(
        text="Load vs Normalized Outdoor Temperature vs Normalized Flow Temperature for the Testing Dataset"
    ),
    scene = dict(
    xaxis=dict(
        title=dict(
            text="Outside Temperature (norm.)"
        )
    ),
    yaxis=dict(
        title=dict(
            text="Flow temperature (norm.)"
        )
    ),
    zaxis=dict(
        title=dict(
            text="Load (W)"
        )
    )),
    legend=dict(
        title=dict(
            text="Series"
        )
    )
)
fig11.show()
Code
fig12 = go.Figure()

fig12.add_trace(go.Scatter(
            name = "Cripps VT",
            x = data_8min.index,
            y = data_8min["1_2"],
            line=dict(color="#00CC96")
           ))

fig12.add_trace(go.Scatter(
            name = "Training + Testing Dataset",
            x = data_toUse.index,
            y = data_toUse["1_2"],
            mode = 'markers'
           ))

fig12.add_trace(go.Scatter(
            name = "Predicted Values",
            x = data_8min.index,
            y = regrC.predict(scaler.fit_transform(data_8min[["Cripps VT Flow-temperature", "Outside-temperature"]])),
            line=dict(color="#AB63FA")
           ))

fig12.update_layout(
    title=dict(
        text="Cripps VT Load vs Time"
    ),
    xaxis=dict(
        title=dict(
            text="Timestamp"
        )
    ),
    yaxis=dict(
        title=dict(
            text="Load (W)"
        )
    ),
    legend=dict(
        title=dict(
            text="Series"
        )
    )
)
fig12.show()

We can see from the above graphs that the model predicts data in the region we would expect, being a pretty accurate line of best fit through the data. The time series shows that it is predicting the correct order of magnitude of values, hovering around the steady state value for the entire data set. Although it appears from both the teim series ands catter graph to be slighlty on the low end of training dtaset.

Looking at the performance statistics of the model, we can see that \(R^2 = 0.74\), a MAPE of \(\pm 8%\) and a RMSE of \(\pm 12\) kW. This indicates that about 74% of the variation of the raw data can be explained by the model. Since the data is on the order of ~150 kW, the \(\pm 8%\) and \(\pm 11\) kW are consistant and indicate that the model is accurate to <10%.

Conclusions


  1. The CT systems compose a neglgible part of the total load of the Y-Plantroom
  2. The data from the BMS is consistant with the Belimo Data, although an offset is present this is likely due to callibration differences
  3. We can use data from the BMS to create a linear model for load so that predictions can be made using sensors that we already have installed in the system
  4. Both models are accurate to about \(\pm 10%\), this indicates that the models are likely accurate enough to provide an indication of the current load of each system but not precise enough for precise configuration of the systems.

Next Steps


  • Investigate more stastical means to test the models performance
  • Create models for the low power and peak power regions
  • Use the current models to forcast the load on each plantroom into the time periods where we no longer have Belimo Data
  • Compare maintanence records with the belimo data to find records of previous faults, compare these times to the Belimo and BMS data to identify what a fault looks like in the data provided. And then use this data to produce a model to predict faults going forward.
  • Use the Heating Curve and Load-Temperature characterisitc to determine when it is the BMS and when it is the weather that is causig the load.